Interactive Predictive Analytics with Columnar Databases

Authors

  • Martin Oberhofer
  • Michael Wurst
Abstract

Predictive analytics is usually seen as a highly interactive task. Paradoxically, it is still performed mostly as a batch task. This not only limits its applicability, it also sets it apart from a task that is conceptually very close to it, namely OLAP analysis. The main reason mining is treated as a batch task is its usually very long execution time on large data warehouses. While novel hardware makes it possible to execute predictive analytics algorithms in a highly distributed fashion, this level of parallelism cannot be exploited within the traditional row-based database paradigm. Columnar databases offer a solution to this problem, as their underlying data structures lend themselves very well to parallel execution. For some algorithms, this reduces the response time of mining queries by several orders of magnitude. While making mining faster and more responsive is valuable in itself, the real value of low response times lies in enabling completely new ways of interacting with huge data warehouses. In this article we survey the opportunities and challenges of interactive, OLAP-like mining and how columnar databases can support it. We illustrate these ideas with a task that is especially attractive for interactive mining, namely outlier detection in large data warehouses.
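The abstract's core technical claim is that a column-oriented layout lets a mining scan be split into independent pieces whose partial results are then merged. The following is a minimal sketch under stated assumptions, not the authors' implementation: it stores a toy table column by column, computes mergeable partial aggregates (count, sum, sum of squares) per chunk in parallel, and flags values whose z-score exceeds a threshold. The table contents, the function names `partial_stats`, `column_stats` and `zscore_outliers`, the chunk count and the threshold of 2.0 are all hypothetical, and z-score detection is a simple stand-in rather than the outlier-detection method used in the paper.

```python
"""Illustrative sketch only (not the paper's method): chunk-parallel
z-score outlier detection over a column-oriented table."""
from concurrent.futures import ThreadPoolExecutor
from math import sqrt

# Hypothetical toy table in columnar ("struct of arrays") form: each
# attribute is stored as its own list, so scanning one attribute never
# touches the others.
table = {
    "revenue": [10.0, 11.0, 9.5, 10.5, 10.2, 55.0, 9.8, 10.1],
    "quantity": [3.0, 4.0, 3.0, 5.0, 4.0, 4.0, 3.0, 40.0],
}


def partial_stats(chunk):
    """Mergeable partial aggregate for one chunk: (count, sum, sum of squares)."""
    return len(chunk), sum(chunk), sum(x * x for x in chunk)


def column_stats(values, n_chunks=4):
    """Mean and population standard deviation of one column, merged from
    per-chunk partial aggregates."""
    size = max(1, -(-len(values) // n_chunks))  # ceiling division
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    # Each chunk is independent, so it could run on its own core or node.
    # Threads stand in for that here; under CPython's GIL they show the
    # structure of the computation rather than a real speed-up.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(partial_stats, chunks))
    n = sum(p[0] for p in partials)
    total = sum(p[1] for p in partials)
    total_sq = sum(p[2] for p in partials)
    mean = total / n
    # One-pass formula: simple to merge, but less numerically stable than
    # Welford-style updates on large or widely spread values.
    variance = max(total_sq / n - mean * mean, 0.0)
    return mean, sqrt(variance)


def zscore_outliers(columns, threshold=2.0):
    """Return {column name: row indices whose |z-score| exceeds threshold}."""
    result = {}
    for name, values in columns.items():
        if not values:
            result[name] = []
            continue
        mean, std = column_stats(values)
        if std == 0.0:
            result[name] = []  # constant column: nothing stands out
            continue
        result[name] = [i for i, x in enumerate(values)
                        if abs(x - mean) / std > threshold]
    return result


if __name__ == "__main__":
    print(zscore_outliers(table))
```

On the toy table this flags row 5 of `revenue` and row 7 of `quantity`. Because (count, sum, sum of squares) combine by plain addition, the same pattern carries over from threads to cores or cluster nodes, whereas a row-oriented layout would typically have to read whole rows just to scan one attribute.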

Similar resources

Columnar Database Techniques for Creating AI Features

Recent advances with in-memory columnar database techniques have increased the performance of analytical queries on very large databases and data warehouses. At the same time, advances in artificial intelligence (AI) algorithms have increased the ability to analyze data. We use the term AI to encompass both Deep Learning (DL or neural network) and Machine Learning (ML aka Big Data analytics). O...

Big Data Analytics and Now-casting: A Comprehensive Model for Eventuality of Forecasting and Predictive Policies of Policy-making Institutions

The ability of now-casting and eventuality is the most crucial and vital achievement of big data analytics in the area of policy-making. To recognize the trends and to render a real image of the current condition and alarming immediate indicators, the significance and the specific positions of big data in policy-making are undeniable. Moreover, the requirement for policy-making institutions to ...

Managing Data for Visual Analytics: Opportunities and Challenges

The domain of Visual Analytics has emerged with a charter to support interactive exploration and analysis of large volumes of (often dynamic) data. A common feature shared by all the visual analytics applications developed so far is the reliance on ad-hoc and custom-built mechanisms to manage data: they re-implement their own in-memory databases to support real-time display and interactive feed...

FlashQueryFile: Flash-Optimized Layout and Algorithms for Interactive Ad Hoc SQL on Big Data

High performance storage layer is vital for allowing interactive ad hoc SQL analytics (OLAP style) over Big Data. The paper makes a case for leveraging flash in the Big Data stack to speed up queries. State-of-the-art Big Data layouts and algorithms are optimized for hard disks (i.e., sequential access is emphasized over random access) and result in suboptimal performance on flash given its dras...

A Blackboard-based Approach Towards Predictive Analytics

Significant increase in collected data for analytics and the increased complexity of the reasoning process itself have made investigative analytical tasks more challenging. These tasks are time critical and typically involve identifying and tracking multiple hypotheses; gathering evidence to validate the correct hypotheses and eliminating the incorrect ones. In this paper we specifically addres...

Publication date: 2011